94 research outputs found

    Towards zero-shot language modeling

    Can we construct a neural language model which is inductively biased towards learning human language? Motivated by this question, we aim to construct an informative prior for held-out languages on the task of character-level, open-vocabulary language modeling. We obtain this prior as the posterior over network weights conditioned on the data from a sample of training languages, which is approximated through Laplace’s method. Based on a large and diverse sample of languages, the use of our prior outperforms baseline models with an uninformative prior in both zero-shot and few-shot settings, showing that the prior is imbued with universal linguistic knowledge. Moreover, we harness broad language-specific information available for most languages of the world, i.e., features from typological databases, as distant supervision for held-out languages. We explore several language modeling conditioning techniques, including concatenation and meta-networks for parameter generation. They appear beneficial in the few-shot setting, but ineffective in the zero-shot setting. Since the paucity of even plain digital text affects the majority of the world’s languages, we hope that these insights will broaden the scope of applications for language technology.
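    The Laplace-prior idea in this abstract can be sketched numerically. Everything below is an illustrative assumption, not the paper's character-level LM: a toy linear model with a squared-error loss stands in for the language model, a MAP fit on pooled "training-language" data plus the diagonal of the Hessian defines the Gaussian (Laplace-approximate) posterior, and that posterior is reused as a quadratic penalty when adapting to a tiny "held-out language" sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's setup: fit weights on pooled "training
# language" data, take the MAP estimate plus a diagonal Hessian as a
# Laplace-approximate posterior, and reuse it as an informative prior
# when adapting to a few-shot "held-out language".

def grad(w, X, y):
    # Gradient of the mean squared error (a stand-in for the LM loss).
    return 2 * X.T @ (X @ w - y) / len(y)

def fit(X, y, steps=500, lr=0.1, prior_mean=None, prior_prec=None):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = grad(w, X, y)
        if prior_mean is not None:
            g = g + prior_prec * (w - prior_mean)  # quadratic prior penalty
        w -= lr * g
    return w

# "Training languages": plenty of data -> MAP estimate w_map.
X_tr = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=200)
w_map = fit(X_tr, y_tr)

# Diagonal Laplace approximation: per-weight precision = Hessian diagonal.
prec = np.diag(2 * X_tr.T @ X_tr / len(y_tr))

# "Held-out language": only 5 examples.
X_ho = rng.normal(size=(5, 5))
y_ho = X_ho @ w_true
err_informative = np.linalg.norm(
    fit(X_ho, y_ho, prior_mean=w_map, prior_prec=prec) - w_true)
err_uninformative = np.linalg.norm(
    fit(X_ho, y_ho, prior_mean=np.zeros(5), prior_prec=np.ones(5)) - w_true)
print(err_informative < err_uninformative)
```

    On this toy problem the informative prior pulls the few-shot fit toward the solution learned from the larger sample, mirroring the zero-/few-shot gains reported above.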

    Specializing distributional vectors of all words for lexical entailment

    Semantic specialization methods fine-tune distributional word vectors using lexical knowledge from external resources (e.g., WordNet) to accentuate a particular relation between words. However, such post-processing methods suffer from limited coverage as they affect only vectors of words seen in the external resources. We present the first post-processing method that specializes vectors of all vocabulary words – including those unseen in the resources – for the asymmetric relation of lexical entailment (LE) (i.e., the hyponymy-hypernymy relation). Leveraging a partially LE-specialized distributional space, our POSTLE (i.e., post-specialization for LE) model learns an explicit global specialization function, allowing for specialization of vectors of unseen words, as well as word vectors from other languages via cross-lingual transfer. We capture the function as a deep feedforward neural network: its objective re-scales vector norms to reflect the concept hierarchy while simultaneously attracting hyponymy-hypernymy pairs to better reflect semantic similarity. An extended model variant augments the basic architecture with an adversarial discriminator. We demonstrate the usefulness and versatility of POSTLE models with different input distributional spaces in different scenarios (monolingual LE and zero-shot cross-lingual LE transfer) and tasks (binary and graded LE). We report consistent gains over state-of-the-art LE-specialization methods, and successfully LE-specialize word vectors for languages without any external lexical knowledge.
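    The norm-rescaling intuition behind LE specialization lends itself to a tiny numeric illustration. The scoring form below (cosine similarity plus norm difference) is an assumption for illustration only, not POSTLE's actual objective: once generality is encoded in the vector norm, an entailment score becomes asymmetric, so hyponym-to-hypernym scores higher than the reverse.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def le_score(hypo, hyper):
    # Higher when the vectors point the same way AND the candidate
    # hypernym has the larger norm (i.e., denotes the broader concept).
    norm_gap = np.linalg.norm(hyper) - np.linalg.norm(hypo)
    return cosine(hypo, hyper) + norm_gap

# Toy vectors: the specific concept gets a small norm, the general one a
# large norm, as a norm-rescaling objective would arrange.
dog = np.array([1.0, 1.0])
animal = np.array([2.0, 2.1])

print(le_score(dog, animal) > le_score(animal, dog))  # → True
```

    The cosine term is symmetric; all asymmetry (and hence the directionality of graded LE) comes from the norm gap.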

    On the relation between linguistic typology and (limitations of) multilingual language modeling

    A key challenge in cross-lingual NLP is developing general language-independent architectures that are equally applicable to any language. However, this ambition is largely hampered by the variation in structural and semantic properties, i.e. the typological profiles of the world's languages. In this work, we analyse the implications of this variation on the language modeling (LM) task. We present a large-scale study of state-of-the-art n-gram-based and neural language models on 50 typologically diverse languages covering a wide variety of morphological systems. Operating in the full vocabulary LM setup focused on word-level prediction, we demonstrate that a coarse typology of morphological systems is predictive of absolute LM performance. Moreover, fine-grained typological features such as exponence, flexivity, fusion, and inflectional synthesis are borne out to be responsible for the proliferation of low-frequency phenomena which are organically difficult to model by statistical architectures, or for the meaning ambiguity of character n-grams. Our study strongly suggests that these features have to be taken into consideration during the construction of next-level language-agnostic LM architectures, capable of handling morphologically complex languages such as Tamil or Korean.

    Adversarial propagation and zero-shot cross-lingual transfer of word vector specialization

    Semantic specialization is a process of fine-tuning pre-trained distributional word vectors using external lexical knowledge (e.g., WordNet) to accentuate a particular semantic relation in the specialized vector space. While post-processing specialization methods are applicable to arbitrary distributional vectors, they are limited to updating only the vectors of words occurring in external lexicons (i.e., seen words), leaving the vectors of all other words unchanged. We propose a novel approach to specializing the full distributional vocabulary. Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space. We exploit words seen in the resources as training examples for learning a global specialization function. This function is learned by combining a standard L2-distance loss with an adversarial loss: the adversarial component produces more realistic output vectors. We show the effectiveness and robustness of the proposed method across three languages and on three tasks: word similarity, dialog state tracking, and lexical simplification. We report consistent improvements over distributional word vectors and vectors specialized by other state-of-the-art specialization frameworks. Finally, we also propose a cross-lingual transfer method for zero-shot specialization which successfully specializes a full target distributional space without any lexical knowledge in the target language and without any bilingual data.
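    The core move here, learning a global function from seen words and applying it to unseen ones, can be sketched compactly. A linear least-squares map (the L2-distance part) stands in for the paper's deep network and adversarial discriminator, and all vectors below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Seen words have both a distributional vector and a specialized vector;
# we learn a global map W from the former to the latter with an L2 loss,
# then propagate specialization to words unseen in the lexicon.
d = 4
true_map = rng.normal(size=(d, d))      # unknown "specialization" transform
seen = rng.normal(size=(50, d))         # distributional vectors, seen words
seen_spec = seen @ true_map             # their specialized counterparts

# L2 fit: W = argmin ||seen @ W - seen_spec||^2 (closed-form least squares).
W, *_ = np.linalg.lstsq(seen, seen_spec, rcond=None)

# Apply the learned global function to an unseen word's vector.
unseen = rng.normal(size=(1, d))
propagated = unseen @ W
print(np.allclose(propagated, unseen @ true_map, atol=1e-6))  # → True
```

    In the paper the mapping is nonlinear and regularized by a discriminator that pushes outputs toward the distribution of genuinely specialized vectors; the linear map above only captures the propagation mechanism.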

    Cross-lingual semantic specialization via lexical relation induction

    Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages, because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: 1) Inducing noisy constraints in the target language through automatic word translation; and 2) Filtering the noisy constraints via a state-of-the-art relation prediction model trained on the source language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We prove the effectiveness of our method through intrinsic word similarity evaluation in 8 languages, and with 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over the previous state-of-the-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages.
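    The two transfer steps can be sketched as a pipeline. The lexicon, the constraint pairs, and the whitelist-style scorer below are illustrative stand-ins, not the paper's translation system or relation prediction model:

```python
# Step 1: induce noisy target-language constraints by word translation.
# Step 2: filter them with a relation scorer trained on the source side
# (here a toy whitelist plays the scorer's role).

translations = {"cat": "gato", "animal": "animal", "car": "coche",
                "vehicle": "vehículo", "bank": "banco"}

source_constraints = [("cat", "animal"), ("car", "vehicle"), ("bank", "river")]

def induce(pairs, lexicon):
    """Translate both members of each pair; drop untranslatable pairs."""
    return [(lexicon[a], lexicon[b]) for a, b in pairs
            if a in lexicon and b in lexicon]

def relation_score(pair):
    """Stand-in for a trained relation-prediction model scoring a pair."""
    plausible = {("gato", "animal"), ("coche", "vehículo")}
    return 1.0 if pair in plausible else 0.0

noisy = induce(source_constraints, translations)
refined = [p for p in noisy if relation_score(p) >= 0.5]
print(refined)  # → [('gato', 'animal'), ('coche', 'vehículo')]
```

    Untranslatable pairs drop out at step 1, and translation-introduced noise (e.g. wrong senses) is meant to drop out at step 2; the refined list is what the target-language vectors are then specialized with.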

    Decoding sentiment from distributed representations of sentences

    Distributed representations of sentences have been developed recently to represent their meaning as real-valued vectors. However, it is not clear how much information such representations retain about the polarity of sentences. To study this question, we decode sentiment from unsupervised sentence representations learned with different architectures (sensitive to the order of words, the order of sentences, or none) in 9 typologically diverse languages. Sentiment results from the (recursive) composition of lexical items and grammatical strategies such as negation and concession. The results are manifold: we show that there is no 'one-size-fits-all' representation architecture outperforming the others across the board. Rather, the top-ranking architectures depend on the language and data at hand. Moreover, we find that in several cases the additive composition model based on skip-gram word vectors may surpass supervised state-of-the-art architectures such as bidirectional LSTMs. Finally, we provide a possible explanation of the observed variation based on the type of negative constructions in each language.
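    The additive composition baseline mentioned above is simple enough to sketch directly. The embeddings and the sign-based decoder below are toy assumptions (real experiments would use skip-gram vectors and a trained classifier):

```python
import numpy as np

# A sentence vector is the sum of its word vectors; a linear decoder then
# reads polarity off the composed vector.
emb = {"good": np.array([1.0, 0.2]), "great": np.array([0.9, 0.1]),
       "bad": np.array([-1.0, 0.3]), "movie": np.array([0.0, 0.5])}

def sentence_vec(tokens):
    return np.sum([emb[t] for t in tokens], axis=0)

def polarity(tokens):
    # Sign of one dimension stands in for a trained linear classifier.
    return "pos" if sentence_vec(tokens)[0] > 0 else "neg"

print(polarity(["good", "movie"]))  # → pos
print(polarity(["bad", "movie"]))   # → neg
```

    Note what this baseline cannot do: addition is order-insensitive, so phenomena like negation ("not good") are exactly where the paper finds language-dependent variation.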

    Hypomelanosis of Ito with a trisomy 2 mosaicism: a case report

    Introduction: Hypomelanosis of Ito is a rare neurocutaneous disorder, characterized by streaks and swirls of hypopigmentation following the lines of Blaschko that may be associated with systemic abnormalities involving the central nervous system and musculoskeletal system. Despite the preponderance of reported sporadic hypomelanosis of Ito, few reports of familial hypomelanosis of Ito have been described. Case presentation: A 6-month-old Caucasian girl presented with unilateral areas of hypomelanosis distributed on the left half of her body and her father presented with similar mosaic hypopigmented lesions on his upper chest. Whereas both blood karyotypes obtained from peripheral lymphocyte cultures were normal, a 16% trisomy 2 mosaicism was found in cultured skin fibroblasts derived from a hypopigmented skin area of her father. Conclusions: Familial cases of hypomelanosis of Ito are very rare and can occur in patients without systemic involvement. Hypomelanosis of Ito constitutes a non-specific diagnostic definition including different clinical entities with a wide phenotypic variability, either sporadic or familial. Unfortunately, a large number of cases remain misdiagnosed due to both diagnostic challenges and controversial issues on cutaneous biopsies in the pediatric population.

    Prostate Cancer Cell Lines under Hypoxia Exhibit Greater Stem-Like Properties

    Hypoxia is an important environmental change in many cancers. Hypoxic niches can be occupied by cancer stem/progenitor-like cells that are associated with tumor progression and resistance to radiotherapy and chemotherapy. However, it has not yet been fully elucidated how hypoxia influences the stem-like properties of prostate cancer cells. In this report, we investigated the effects of hypoxia on the human prostate cancer cell lines PC-3 and DU145. In comparison to normoxia (20% O2), 7% O2 induced higher expression of HIF-1α and HIF-2α, which was associated with upregulation of Oct3/4 and Nanog; 1% O2 induced even greater levels of these factors. The upregulated NANOG mRNA expression in hypoxia was confirmed to be predominantly retrogene NANOGP8. Similar growth rates were observed for cells cultivated under hypoxic and normoxic conditions for 48 hours; however, the colony formation assay revealed that 48 hours of hypoxic pretreatment resulted in the formation of more colonies. Treatment with 1% O2 also extended the G0/G1 stage, resulting in more side population cells, and induced CD44 and ABCG2 expression. Hypoxia also increased the number of cells positive for ABCG2 expression, which were predominantly found to be CD44bright cells. Correspondingly, the sorted CD44bright cells expressed higher levels of ABCG2, Oct3/4, and Nanog than CD44dim cells, and hypoxic pretreatment significantly increased the expression of these factors. CD44bright cells under normoxia formed significantly more colonies and spheres compared with the CD44dim cells, and hypoxic pretreatment further increased this effect. Our data indicate that prostate cancer cells under hypoxia possess greater stem-like properties.

    The Role of Actin Turnover in Retrograde Actin Network Flow in Neuronal Growth Cones

    The balance of actin filament polymerization and depolymerization maintains a steady state network treadmill in neuronal growth cones essential for motility and guidance. Here we have investigated the connection between depolymerization and treadmilling dynamics. We show that polymerization-competent barbed ends are concentrated at the leading edge and depolymerization is distributed throughout the peripheral domain. We found a high-to-low G-actin gradient between peripheral and central domains. Inhibiting turnover with jasplakinolide collapsed this gradient and lowered leading edge barbed end density. Ultrastructural analysis showed dramatic reduction of leading edge actin filament density and filament accumulation in central regions. Live cell imaging revealed that the leading edge retracted even as retrograde actin flow rate decreased exponentially. Inhibition of myosin II activity before jasplakinolide treatment lowered baseline retrograde flow rates and prevented leading edge retraction. Myosin II activity preferentially affected filopodial bundle disassembly, distinct from the global effects of jasplakinolide on network turnover. We propose that growth cone retraction following turnover inhibition resulted from the persistence of myosin II contractility even as leading edge assembly rates decreased. The buildup of actin filaments in central regions combined with monomer depletion and reduced polymerization from barbed ends suggests a mechanism for the observed exponential decay in actin retrograde flow. Our results show that growth cone motility is critically dependent on continuous disassembly of the peripheral actin network.